Abstract. While there are now a number of languages and frameworks that enable
computer-based systems to search stored data semantically, the optimal design for
effective user interfaces for such systems is still unclear. Such interfaces should
mask unnecessary query detail from users, yet still allow them to build queries of
arbitrary complexity without significant restrictions. We developed a user interface
supporting semantic query generation for SemanticOrganizer, a tool used by scientists
and engineers at NASA to construct networks of knowledge and data. Through this
interface users can select node types, node attributes and node links to build ad-hoc
semantic queries for searching the SemanticOrganizer network.


1 Introduction 

To imbue web documents with machine-readable semantic content, authors now have 
formats such as RDF for storing such content [1] and tools like Annotea and the 
SHOE Knowledge Annotator [2] to help create such content. Furthermore, standards 
for query languages to search this content are also beginning to emerge [3]. However, 
there are still very few tools to help users create semantic queries in any of these
languages, and the design of such tools remains the subject of ongoing research.

We have developed a user interface for building semantic queries of arbitrary com- 
plexity for SemanticOrganizer 1 (SO), a combined knowledge and data repository that 
features an extensive semantic network. Through this interface a user can generate a 
complex query to search the SO knowledge space for sets of items consistent with the 
query. The queries are stored as RDF models with anonymous nodes, hidden within 
HTML pages of the interface, and incrementally updated as the user builds a query. 

We approached the design of this interface with the twin goals of accommodating
users who know little or nothing about RDF and presenting the queries in a simple,
straightforward manner. SO has a wide array of users who vary in technical savvy and
who use a variety of computing platforms and software. By and large, most user
interaction with SO is via HTML forms, and we have eschewed more sophisticated
interfaces such as specialized Java applet widgets largely because of
cross-platform/browser compatibility issues. Thus, we sought to develop a semantic
searching interface using only HTML technology.

1 http://sciencedesk.arc.nasa.gov/

Figure 1. Flow of user interaction with the SemanticOrganizer query building interface


2 Methods


Because building all but the simplest query can require substantial cognitive effort
from users, we chose a successive refinement design for the query building interface
(Fig. 1). Users iteratively add “terms” to a query; each term is represented as a
typed, but otherwise anonymous, node in the RDF model. Each node is added by linking
it to a node already in the model through a “link” type property selected from the SO
knowledge network. The query can be submitted for execution any time after the first
node is created and added to the model. In fact, the user can continue to refine the
query and/or submit it for execution even after search results are presented.

Figures 2 through 7 show the development of a query to search for all DNA sequences
from any bacterial culture of a (stromatolite) sample with certain properties. Figure
2 shows a user beginning to build a query using the interface. The interface is
separated by a simple horizontal line into an upper query building area, in which
users select and edit terms in the query, and a lower query execution area, in which
users can choose to submit a query, view search results, or erase the current query
and begin again. Because the query (in its current state) is stored client-side (i.e.,
embedded within the web page) and not server-side, the user can “back up” to previous
versions of the query at will using only their browser’s navigation buttons and pursue
different paths of query refinement.

Figure 2. The Node Type Selection display of the query builder interface. The user
must select a type for the first (and each successive) node in the query
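The client-side storage described above can be illustrated with a minimal sketch. It
assumes the query is a list of triples serialized as JSON into a hidden form field;
the paper states only that the RDF model is hidden within the HTML pages, so the JSON
encoding, the field name `query_model`, and both function names are hypothetical.

```python
import html
import json

def embed_query(triples):
    """Return a hidden <input> element carrying the serialized query.

    JSON is an assumed serialization; the original system embeds an RDF model.
    """
    payload = json.dumps(triples)
    return '<input type="hidden" name="query_model" value="%s">' % html.escape(
        payload, quote=True
    )

def extract_query(field_value):
    """Rebuild the triple list from the (already unescaped) field value."""
    return [tuple(t) for t in json.loads(field_value)]

# One typed anonymous node, written as a blank-node style identifier.
triples = [("_:n1", "rdf:type", "DNA Sequence")]
field = embed_query(triples)

# A browser would unescape the attribute and post it back with the form.
posted_value = html.unescape(field.split('value="')[1].rsplit('"', 1)[0])
restored = extract_query(posted_value)
```

Because every page carries its own complete copy of the query, returning to an earlier
page with the browser's Back button also returns to the earlier query state, with no
session bookkeeping on the server.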

As shown in Figure 2, the user begins to build the query by selecting the type
“DNA Sequence” for a new (anonymous) node in the model, labeled “DNA Sequence 1.”
All nodes are typed and are labeled with their type and order of creation. Specifying
the type for a node in the query adds a statement to the RDF model that restricts the
type property of the node to the appropriate class.

Figure 3. The Edit Node Properties step. After selecting a type for the new node, the
user is presented with a form to select/edit literal property values (upper right)
and/or choose a “link” type property (lower right) to connect the new node to another
node

The interface requires the user to select a type for each node before any of the
node’s properties can be defined. While this design choice initially followed logically
from the types of queries we solicited from potential users (e.g., “Find all
experiments...”, “Find all samples...”, etc.), it also obviated the need to develop
methods for users to sort through the dozens or even hundreds of possible properties
defined on all types in a given domain. Instead, the interface only needs to display
those properties whose domain is the type of node selected.
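The domain-based filtering described above might look like the following sketch. The
schema entries, property names, and the `properties_for` helper are all hypothetical
and are not taken from the SO knowledge network; subclass relationships are ignored
for brevity.

```python
from typing import NamedTuple

class Property(NamedTuple):
    name: str
    domain: str       # class on which the property is defined
    range_type: str   # "literal", or the class of the node it links to

# Hypothetical schema fragment for illustration only.
SCHEMA = [
    Property("name", "Sample", "literal"),
    Property("collectedOn", "Sample", "literal"),
    Property("cultivatedFrom", "Culture", "Sample"),
    Property("sequencedFrom", "DNA Sequence", "Culture"),
    Property("length", "DNA Sequence", "literal"),
]

def properties_for(node_type, schema=SCHEMA):
    """Return only the properties whose domain is the selected node type."""
    return [p for p in schema if p.domain == node_type]

offered = properties_for("DNA Sequence")
literal_props = [p.name for p in offered if p.range_type == "literal"]
link_props = [p.name for p in offered if p.range_type != "literal"]
```

Splitting the filtered list into literal-valued and link-type properties mirrors the
two parts of the Edit Node form.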

After the user chooses the type of node to be added to the model, the interface
displays the Edit Node form (Fig. 3). This form allows the user to enter or select
literal-valued properties of the node, or select from a list of properties that have
other nodes as ranges to link this node to other nodes in the model. Literals can be
specified by entering them directly or selecting from a list of allowed values (if
such a list is defined for the property type) and submitting the form; the returned
page displays the values along with an adjacent “scissors” icon which can be used to
submit the form again, this time removing the value. Values for any number of literal
properties may be submitted all at once or in any sequence as many times as desired.
However, once the user selects a “link” type property and submits the form, the
interface requires the user to specify a class for the range of this property (Fig.
4). After the user has selected a range type, a new node (“Culture 1”) is added to the
model, as well as a statement restricting its type to the type specified, and a
statement linking the two nodes through the selected property. This action returns the
user to the Node Listing display, showing the two newly created anonymous nodes along
with the list of all node types (Fig. 5).
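The incremental model updates described above can be sketched as follows. This is a
minimal illustration using plain tuples rather than a real RDF library; the class
name, method names, property names, and the `_:n` identifier scheme are assumptions
made for the example.

```python
import itertools

class QueryModel:
    """A query as a list of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = []
        self.labels = {}                  # anonymous node id -> display label
        self._ids = itertools.count(1)
        self._count_by_type = {}

    def add_node(self, node_type):
        """Create a typed anonymous node and record its type statement."""
        node = "_:n%d" % next(self._ids)
        n = self._count_by_type.get(node_type, 0) + 1
        self._count_by_type[node_type] = n
        self.labels[node] = "%s %d" % (node_type, n)
        self.triples.append((node, "rdf:type", node_type))
        return node

    def set_literal(self, node, prop, value):
        """Constrain a literal-valued property of an existing node."""
        self.triples.append((node, prop, value))

    def remove_literal(self, node, prop, value):
        """Undo set_literal (the role of the "scissors" icon)."""
        self.triples.remove((node, prop, value))

    def link_new_node(self, source, prop, range_type):
        """Add a new typed node and link an existing node to it."""
        target = self.add_node(range_type)
        self.triples.append((source, prop, target))
        return target

# Hypothetical property names; the node types follow the running example.
q = QueryModel()
seq = q.add_node("DNA Sequence")
cul = q.link_new_node(seq, "sequencedFrom", "Culture")
sam = q.link_new_node(cul, "cultivatedFrom", "Sample")
q.set_literal(sam, "lithification", "yes")
```

Since every node after the first enters the model through `link_new_node`, the
resulting structure is always connected and free of cycles.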

At this point the user can either select one of the existing nodes in the model (to
add other links and/or property values) or choose the type for a third new node in the
model. He or she can continue the cycle of creating and editing existing nodes at will
until satisfied with the query. This cycle could produce, for example, the complex
query shown in Figure 6.

Figure 4. The query after one round of node editing. The user is proceeding to shape
the query by selecting the anonymous node “Culture 1” as the next node to edit

Figure 6. The final complex query to search DNA sequences from any culture cultivated
from a sample of Cyanobacteria showing lithification

In the query execution area of the interface, we display generated queries in tabular
form, which is well supported in HTML. Each node in the query is assigned a
corresponding column in the tabular display, and each row displays one or more links
between nodes. While this format may not be concise, it is probably superior to
merely listing the nodes and links of the model.
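One plausible way to produce such a table is sketched below. The exact cell layout of
the SO interface is not specified in the text, so the layout here (one row per link,
with the property name in the source node's column and a marker in the target node's
column) is an assumption, as are the function name and the property names.

```python
from html import escape

def query_table(labels, links):
    """Render a query as an HTML table: one column per node, one row per link.

    labels: node labels in order of creation
    links:  (source_label, property_name, target_label) tuples
    """
    head = "".join("<th>%s</th>" % escape(label) for label in labels)
    rows = []
    for src, prop, dst in links:
        cells = []
        for label in labels:
            if label == src:
                cells.append("<td>%s &rarr;</td>" % escape(prop))
            elif label == dst:
                cells.append("<td>&bull;</td>")
            else:
                cells.append("<td></td>")
        rows.append("<tr>%s</tr>" % "".join(cells))
    return "<table><tr>%s</tr>%s</table>" % (head, "".join(rows))

table_html = query_table(
    ["DNA Sequence 1", "Culture 1", "Sample 1"],
    [
        ("DNA Sequence 1", "sequencedFrom", "Culture 1"),
        ("Culture 1", "cultivatedFrom", "Sample 1"),
    ],
)
```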

We could have designed the interface such that users could create any type of 
graph structure, including those with cycles. However, the use of HTML tables to 
display queries with cycles clearly and unambiguously appeared very challenging, if 
not impossible. Thus, we chose not to allow users to generate cyclical query struc- 
tures using this initial version of the query-building interface. 

At any time during the process of building the complex query, the user may choose
to completely erase the query through the “Clear Query” button or execute the query
by pressing the “Perform Search” button. Choosing to erase the query removes the
RDF model embedded in the page and returns the user to the first step in the query
building process (see Figure 2).

To execute the query, we viewed searching the SO knowledge space using the generated
query as a constraint satisfaction problem (CSP) (as others have); the nodes in the
query represent the set of variables in the CSP, the items in SO correspond to the
domain of possible values for these variables, and the various properties in the RDF
model that the user specifies represent the constraints. We developed procedures to
solve this CSP using common programming techniques to increase efficiency, including
node and arc-consistency tracking.

Figure 7 shows the search results for the query shown in Figure 6. Each node in the
query corresponds to a column in the results table, and the possible sets of values
for the nodes are listed as rows. Clicking on a particular value shows the item in SO.

Daniel C. Berrios, Richard M. Keller

[Results table: 3 search results matching your query]

Figure 7. Example search results. Each row represents a set of possible values for
each node in the model
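The constraint satisfaction view of query execution can be made concrete with a small
sketch. It is not the authors' implementation: it assumes an AC-3 style arc
consistency pass followed by simple backtracking, and the repository contents, item
identifiers, and property names are invented for illustration.

```python
from collections import deque

def solve_query(variables, items, unary, links, facts):
    """Enumerate assignments of repository items to query nodes.

    variables: query node labels
    items:     candidate item ids (the shared initial domain)
    unary:     {variable: [predicate(item) -> bool]} type and literal constraints
    links:     (source_variable, property, target_variable) constraints
    facts:     set of (source_item, property, target_item) repository links
    """
    # Node consistency: keep only items that satisfy every unary constraint.
    domains = {
        v: {i for i in items if all(test(i) for test in unary.get(v, []))}
        for v in variables
    }

    # Each link yields two directed arcs (x, y, check), where check(a, b)
    # takes an item a for x and an item b for y.
    arcs = []
    for src, prop, dst in links:
        arcs.append((src, dst, lambda a, b, p=prop: (a, p, b) in facts))
        arcs.append((dst, src, lambda b, a, p=prop: (a, p, b) in facts))

    # Arc consistency: drop values of x that have no supporting value in y.
    queue = deque(arcs)
    while queue:
        x, y, supported = queue.popleft()
        unsupported = {a for a in domains[x]
                       if not any(supported(a, b) for b in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            queue.extend(arc for arc in arcs if arc[1] == x and arc[0] != y)

    # Backtracking search over the reduced domains.
    solutions = []

    def extend(assignment, remaining):
        if not remaining:
            solutions.append(dict(assignment))
            return
        var, rest = remaining[0], remaining[1:]
        for item in sorted(domains[var]):
            assignment[var] = item
            if all((assignment[s], p, assignment[d]) in facts
                   for s, p, d in links
                   if s in assignment and d in assignment):
                extend(assignment, rest)
            del assignment[var]

    extend({}, list(variables))
    return solutions

# Hypothetical repository contents.
types = {"seq-1": "DNA Sequence", "seq-2": "DNA Sequence",
         "cul-1": "Culture", "cul-2": "Culture",
         "sam-1": "Sample", "sam-2": "Sample"}
lithified = {"sam-1"}
facts = {("seq-1", "sequencedFrom", "cul-1"),
         ("seq-2", "sequencedFrom", "cul-2"),
         ("cul-1", "cultivatedFrom", "sam-1"),
         ("cul-2", "cultivatedFrom", "sam-2")}

variables = ["DNA Sequence 1", "Culture 1", "Sample 1"]
unary = {
    "DNA Sequence 1": [lambda i: types[i] == "DNA Sequence"],
    "Culture 1": [lambda i: types[i] == "Culture"],
    "Sample 1": [lambda i: types[i] == "Sample", lambda i: i in lithified],
}
links = [("DNA Sequence 1", "sequencedFrom", "Culture 1"),
         ("Culture 1", "cultivatedFrom", "Sample 1")]

results = solve_query(variables, types, unary, links, facts)
```

Each dictionary in `results` corresponds to one row of a results table, with one
entry per query node.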


3 Discussion

We present our experience developing a complex query generation interface that we
hope will be effective and at the same time intelligible to naive web users. The
missions and scientific activities conducted at NASA often involve users who vary
widely in their computer science sophistication and experience with computing tools.
Yet even unsophisticated users have advanced information needs that will require them
to be able to specify complex queries.

We will extend the functions of the interface to include more features that users 
will likely require, such as selecting and searching for multiple values for literal- 
valued properties (using Boolean OR), specifying ranges of values for special types of 
literals such as dates and times, and range sets for link-type properties. Because 
building some queries often requires significant time and thought, we are also devel- 
oping methods for users to store, retrieve, re-edit and re-execute complex queries. 